Subword Variation in Text Message Classification

Authors

Robert Munro

Christopher D. Manning

Abstract

For millions of people in less resourced regions of the world, text messages (SMS) provide the only regular contact with their doctor. Classifying messages by medical labels supports rapid responses to emergencies, the early identification of epidemics and everyday administration, but challenges include text brevity, rich morphology, phonological variation, and limited training data. We present a novel system that addresses these, working with a clinic in rural Malawi and texts in the Chichewa language. We show that modeling morphological and phonological variation leads to a substantial average gain of F=0.206 and an error reduction of up to 63.8% for specific labels, relative to a baseline system optimized over word-sequences. By comparison, there is no significant gain when applying the same system to the English translations of the same texts/labels, emphasizing the need for subword modeling in many languages. Language-independent morphological models perform as accurately as language-specific models, indicating a broad deployment potential.

Related resources

An Investigation of Subword Unit Representations for Spoken Document Retrieval

This study investigates the feasibility of using subword unit representations for spoken document retrieval as an alternative to using words generated by either keyword spotting or word recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recogn...

Subword and Spatiotemporal Models for Identifying Actionable Information in Haitian Kreyol

Crisis-affected populations are often able to maintain digital communications but in a sudden-onset crisis any aid organizations will have the least free resources to process such communications. Information that aid agencies can actually act on, ‘actionable’ information, will be sparse so there is great potential to (semi)automatically identify actionable communications. However, there are hur...

Semantic Prosody: Its Knowledge and Appropriate Selection of Equivalents

In translation, choosing an appropriate equivalent is essential to convey the right message from source text to target text, and one of the issues that may play a decisive role in that choice is the semantic prosody (SP) behavior of words and the relation existing between the SP of a word and semantic senses (i.e. negativity, positivity or neutrality) of its collocations in ...

Using machine learning method and subword unit representations for spoken document categorization

In this paper, we investigate the feasibility of using a machine learning method and subword units for spoken document categorization as an alternative to using words generated by word recognition or keyword spotting. An advantage of using subword acoustic unit representations for spoken document categorization is that it does not require prior knowledge about the contents of the spoken document a...

Merging search spaces for subword spoken term detection

We describe how complementary search spaces, addressed by two different methods used in Spoken Term Detection (STD), can be merged for German subword STD. We propose fuzzy-search techniques on lattices to narrow the gap between subword and word retrieval. The first technique is based on an edit distance, where no a priori knowledge about confusions is employed. Additionally, we propose a weighti...

Publication date: 2010
